A Semantic Concordance

Authors

  • George A. Miller
  • Claudia Leacock
  • Randee Tengi
  • Ross Bunker
Abstract

A semantic concordance is a textual corpus and a lexicon so combined that every substantive word in the text is linked to its appropriate sense in the lexicon. Thus it can be viewed either as a corpus in which words have been tagged syntactically and semantically, or as a lexicon in which example sentences can be found for many definitions. A semantic concordance is being constructed for use in studies of sense resolution in context (semantic disambiguation). The Brown Corpus is the text and WordNet is the lexicon. Semantic tags (pointers to WordNet synsets) are inserted in the text manually using an interface, ConText, that was designed to facilitate the task. Another interface supports searches of the tagged text. Some practical uses for semantic concordances are proposed.

1. INTRODUCTION

We wish to propose a new version of an old idea. Lexicographers have traditionally based their work on a corpus of examples taken from approved usage, but considerations of cost usually limit published dictionaries to lexical entries having only a scattering of phrases to illustrate the usages from which definitions were derived. As a consequence of this economic pressure, most dictionaries are relatively weak in providing contextual information: someone learning English as a second language will find in an English dictionary many alternative meanings for a common word, but little or no help in determining the linguistic contexts in which the word can be used to express those different meanings. Today, however, large computer memories are affordable enough that this limitation can be removed; it would now be feasible to publish a dictionary electronically along with all of the citation sentences on which it was based. The resulting combination would be more than a lexicon and more than a corpus; we propose to call it a semantic concordance.
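
The structure just defined, a corpus whose substantive words each carry a pointer to one sense in a lexicon, can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the class names, the sense IDs such as `bank%1`, and the toy definitions are all hypothetical. It shows how a single set of semantic tags supports both readings: as a tagged corpus (`tags`) and, through an inverted index, as a lexicon with example sentences for each definition (`examples`). The `search` method is a loose, hypothetical stand-in for searching the tagged text.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaggedToken:
    word: str
    pos: Optional[str] = None    # syntactic tag, e.g. "NN"
    sense: Optional[str] = None  # semantic tag: a (hypothetical) ID pointing into the lexicon


@dataclass
class SemanticConcordance:
    lexicon: dict[str, str]  # hypothetical sense ID -> definition
    sentences: list[list[TaggedToken]] = field(default_factory=list)
    # Inverted index, sense ID -> sentence numbers. This is what lets the
    # same tags be read as a lexicon with example sentences.
    _by_sense: dict[str, list[int]] = field(
        default_factory=lambda: defaultdict(list), init=False, repr=False)

    def add_sentence(self, tokens: list[TaggedToken]) -> int:
        idx = len(self.sentences)
        # Validate first, so a bad tag leaves the concordance unchanged.
        for tok in tokens:
            if tok.sense is not None and tok.sense not in self.lexicon:
                raise KeyError(f"tag {tok.sense!r} points to no lexicon entry")
        self.sentences.append(tokens)
        for tok in tokens:
            if tok.sense is not None:
                hits = self._by_sense[tok.sense]
                if not hits or hits[-1] != idx:  # record each sentence once
                    hits.append(idx)
        return idx

    def examples(self, sense: str) -> list[str]:
        """Lexicon view: the sentences that illustrate one definition."""
        return [" ".join(t.word for t in self.sentences[i])
                for i in self._by_sense.get(sense, [])]

    def tags(self, idx: int) -> list[tuple[str, Optional[str], str]]:
        """Corpus view: (word, sense ID, definition) for each tagged word."""
        return [(t.word, t.sense, self.lexicon[t.sense])
                for t in self.sentences[idx] if t.sense is not None]

    def search(self, word: str) -> list[tuple[int, Optional[str]]]:
        """Every occurrence of a word form, with the sense it was tagged with."""
        return [(i, t.sense) for i, sent in enumerate(self.sentences)
                for t in sent if t.word.lower() == word.lower()]


# Invented toy data; for brevity only "bank" carries a semantic tag here.
lexicon = {
    "bank%1": "sloping land beside a body of water",
    "bank%2": "a financial institution",
}
sc = SemanticConcordance(lexicon)
sc.add_sentence([TaggedToken("They", "PRP"), TaggedToken("fished", "VBD"),
                 TaggedToken("from", "IN"), TaggedToken("the", "DT"),
                 TaggedToken("bank", "NN", "bank%1")])
sc.add_sentence([TaggedToken("She", "PRP"), TaggedToken("phoned", "VBD"),
                 TaggedToken("the", "DT"), TaggedToken("bank", "NN", "bank%2")])

print(sc.examples("bank%2"))  # ['She phoned the bank']
print(sc.search("bank"))      # [(0, 'bank%1'), (1, 'bank%2')]
```

Updating the inverted index as each sentence is added is the design choice that keeps the lexicon view cheap in this sketch: finding examples for a sense does not require rescanning the corpus.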

If the corpus is some specific text, it is a specific semantic concordance; if the corpus includes many different texts, it is a universal semantic concordance. We have begun constructing a universal semantic concordance in conjunction with our work on a lexical database. The result can be viewed either as a collection of passages in which words have been tagged syntactically and semantically, or as a lexicon in which illustrative sentences can be found for many definitions. At the present time, the correlation of a lexical meaning with examples in which a word is used to express that meaning must be done by hand. Manual semantic tagging is tedious; it should be done automatically as soon as it is possible to resolve word senses in context automatically. It is hoped that the manual creation of a semantic concordance will provide an appropriate environment for developing and testing those automatic procedures.

2. WORDNET: A LEXICAL DATABASE

The lexical component of the universal semantic concordance that we are constructing is WordNet, an on-line lexical resource inspired by current psycholinguistic theories of human lexical memory [1, 2]. A standard, handheld dictionary is organized alphabetically; it puts together words that are spelled alike and scatters words with related meanings. Although on-line versions of such standard dictionaries can relieve a user of alphabetical searches, it is clearly inefficient to use a computer merely as a rapid page-turner. WordNet is an example of a more efficient combination of traditional lexicography and modern computer science.

The most ambitious feature of WordNet is the attempt to organize lexical information in terms of word meanings, rather than word forms. WordNet is organized by semantic relations (rather than by semantic components) within the open-class categories of noun, verb, adjective, and adverb; closed-class categories of words (pronouns, prepositions, conjunctions, etc.) are not included in WordNet. The semantic relations among open-class words include: synonymy and antonymy (which are semantic relations between words and which are found in all four syntactic categories); hyponymy and hypernymy (which are semantic relations between concepts and which organize nouns into a categorical hierarchy); meronymy and holonymy (which represent part-whole relations among noun concepts); and troponymy (manner relations) and entailment relations between verb concepts. These semantic relations were chosen to be intuitively obvious to nonlinguists and to have broad applicability throughout the lexicon.

The basic elements of WordNet are sets of synonyms (or synsets), which are taken to represent lexicalized concepts. A synset is a group of words that are synonymous, in the sense that there are contexts in which they can be interchanged without changing the meaning of the statement. For example, WordNet distinguishes between the synsets:
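
The organization described in this section (word forms grouped into synsets, lexical relations such as antonymy holding between words, and conceptual relations such as hypernymy holding between synsets) can be sketched as a small in-memory structure. This is a hypothetical toy in Python, not WordNet's actual file format or any library's API; the `Synset` and `ToyWordNet` names, the synset IDs, and all sample entries are invented for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonyms taken to represent one lexicalized concept."""
    id: str
    pos: str                 # open classes only: "n", "v", "a", "r"
    words: frozenset[str]
    # Hypernymy relates concepts, so these pointers hold synset IDs.
    hypernyms: list[str] = field(default_factory=list)


class ToyWordNet:
    """Hypothetical in-memory sketch of a lexicon organized by meaning."""

    def __init__(self) -> None:
        self.synsets: dict[str, Synset] = {}
        # Word form -> IDs of every synset containing it (one per sense).
        self.index: dict[str, list[str]] = defaultdict(list)
        # Antonymy relates word forms, so keys are (word, synset ID) pairs.
        self._antonyms: dict[tuple[str, str], set[tuple[str, str]]] = defaultdict(set)

    def add(self, synset: Synset) -> None:
        self.synsets[synset.id] = synset
        for word in sorted(synset.words):
            self.index[word].append(synset.id)

    def senses(self, word: str) -> list[str]:
        return list(self.index.get(word, []))

    def synonyms(self, word: str) -> set[str]:
        """Words that share at least one synset with `word`."""
        return {w for sid in self.senses(word) for w in self.synsets[sid].words} - {word}

    def add_antonyms(self, w1: str, sid1: str, w2: str, sid2: str) -> None:
        self._antonyms[(w1, sid1)].add((w2, sid2))
        self._antonyms[(w2, sid2)].add((w1, sid1))

    def antonyms(self, word: str, sid: str) -> set[str]:
        return {w for w, _ in self._antonyms.get((word, sid), set())}

    def hypernym_chain(self, sid: str) -> list[str]:
        """Follow the first hypernym pointer upward until a root is reached."""
        chain, seen = [sid], {sid}
        while self.synsets[chain[-1]].hypernyms:
            parent = self.synsets[chain[-1]].hypernyms[0]
            if parent in seen:  # guard against accidental cycles
                break
            chain.append(parent)
            seen.add(parent)
        return chain


# Invented toy entries, not taken from WordNet.
wn = ToyWordNet()
wn.add(Synset("n.vehicle", "n", frozenset({"vehicle"})))
wn.add(Synset("n.motor_vehicle", "n", frozenset({"motor_vehicle"}), ["n.vehicle"]))
wn.add(Synset("n.car.1", "n", frozenset({"car", "auto", "automobile"}), ["n.motor_vehicle"]))
wn.add(Synset("n.car.2", "n", frozenset({"car", "railcar"}), ["n.vehicle"]))
wn.add(Synset("v.rise", "v", frozenset({"rise", "go_up"})))
wn.add(Synset("v.fall", "v", frozenset({"fall", "go_down"})))
wn.add_antonyms("rise", "v.rise", "fall", "v.fall")

print(wn.senses("car"))               # ['n.car.1', 'n.car.2']
print(sorted(wn.synonyms("car")))     # ['auto', 'automobile', 'railcar']
print(wn.hypernym_chain("n.car.1"))   # ['n.car.1', 'n.motor_vehicle', 'n.vehicle']
print(wn.antonyms("rise", "v.rise"))  # {'fall'}
print(wn.antonyms("go_up", "v.rise")) # set(): antonymy is recorded per word, not per synset
```

In this sketch the word-form index lets a polysemous form such as `car` resolve to several concepts, while the hypernym pointers, which connect synsets rather than words, build the categorical hierarchy. Keeping antonymy keyed by word forms reflects the text's distinction between relations among words and relations among concepts.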

Publication date: 1993